{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fdfa896e",
   "metadata": {
    "origin_pos": 1
   },
   "source": [
    "# Linear Regression\n",
    ":label:`sec_linear_regression`\n",
    "\n",
    "*Regression* problems pop up whenever we want to predict a numerical value.\n",
    "Common examples include predicting prices (of homes, stocks, etc.),\n",
    "predicting the length of stay (for patients in the hospital),\n",
    "forecasting demand (for retail sales), among numerous others.\n",
    "Not every prediction problem is one of classical regression.\n",
    "Later on, we will introduce classification problems,\n",
    "where the goal is to predict membership among a set of categories.\n",
    "\n",
    "As a running example, suppose that we wish\n",
    "to estimate the prices of houses (in dollars)\n",
    "based on their area (in square feet) and age (in years).\n",
    "To develop a model for predicting house prices,\n",
    "we need to get our hands on data,\n",
    "including the sales price, area, and age for each home.\n",
    "In the terminology of machine learning,\n",
    "the dataset is called a *training dataset* or *training set*,\n",
    "and each row (containing the data corresponding to one sale)\n",
    "is called an *example* (or *data point*, *instance*, *sample*).\n",
    "The thing we are trying to predict (price)\n",
    "is called a *label* (or *target*).\n",
    "The variables (age and area)\n",
    "upon which the predictions are based\n",
    "are called *features* (or *covariates*).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "518bf6ae",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:22.515904Z",
     "iopub.status.busy": "2023-08-18T19:40:22.515596Z",
     "iopub.status.idle": "2023-08-18T19:40:25.958016Z",
     "shell.execute_reply": "2023-08-18T19:40:25.957007Z"
    },
    "origin_pos": 3,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import math\n",
    "import time\n",
    "import numpy as np\n",
    "import torch\n",
    "from d2l import torch as d2l"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e5e9b15e",
   "metadata": {
    "origin_pos": 6
   },
   "source": [
    "## Basics\n",
    "\n",
    "*Linear regression* is both the simplest\n",
    "and most popular among the standard tools\n",
    "for tackling regression problems.\n",
    "Dating back to the dawn of the 19th century :cite:`Legendre.1805,Gauss.1809`,\n",
    "linear regression flows from a few simple assumptions.\n",
    "First, we assume that the relationship\n",
    "between features $\\mathbf{x}$ and target $y$\n",
    "is approximately linear,\n",
    "i.e., that the conditional mean $E[Y \\mid X=\\mathbf{x}]$\n",
    "can be expressed as a weighted sum\n",
    "of the features $\\mathbf{x}$.\n",
    "This setup allows that the target value\n",
    "may still deviate from its expected value\n",
    "on account of observation noise.\n",
    "Next, we can impose the assumption that any such noise\n",
    "is well behaved, following a Gaussian distribution.\n",
    "Typically, we will use $n$ to denote\n",
    "the number of examples in our dataset.\n",
    "We use superscripts to enumerate samples and targets,\n",
    "and subscripts to index coordinates.\n",
    "More concretely,\n",
    "$\\mathbf{x}^{(i)}$ denotes the $i^{\\textrm{th}}$ sample\n",
    "and $x_j^{(i)}$ denotes its $j^{\\textrm{th}}$ coordinate.\n",
    "\n",
    "### Model\n",
    ":label:`subsec_linear_model`\n",
    "\n",
    "At the heart of every solution is a model\n",
    "that describes how features can be transformed\n",
    "into an estimate of the target.\n",
    "The assumption of linearity means that\n",
    "the expected value of the target (price) can be expressed\n",
    "as a weighted sum of the features (area and age):\n",
    "\n",
    "$$\\textrm{price} = w_{\\textrm{area}} \\cdot \\textrm{area} + w_{\\textrm{age}} \\cdot \\textrm{age} + b.$$\n",
    ":eqlabel:`eq_price-area`\n",
    "\n",
    "Here $w_{\\textrm{area}}$ and $w_{\\textrm{age}}$\n",
    "are called *weights*, and $b$ is called a *bias*\n",
    "(or *offset* or *intercept*).\n",
    "The weights determine the influence of each feature on our prediction.\n",
    "The bias determines the value of the estimate when all features are zero.\n",
    "Even though we will never see any newly-built homes with precisely zero area,\n",
    "we still need the bias because it allows us\n",
    "to express all linear functions of our features\n",
    "(rather than restricting us to lines that pass through the origin).\n",
    "Strictly speaking, :eqref:`eq_price-area` is an *affine transformation* of input features, which is characterized by a *linear transformation* of features via a weighted sum, combined with a *translation* via the added bias.\n",
    "Given a dataset, our goal is to choose\n",
    "the weights $\\mathbf{w}$ and the bias $b$\n",
    "that, on average, make our model's predictions\n",
    "fit the true prices observed in the data as closely as possible.\n",
    "\n",
    "\n",
    "In disciplines where it is common to focus\n",
    "on datasets with just a few features,\n",
    "explicitly expressing models long-form,\n",
    "as in :eqref:`eq_price-area`, is common.\n",
    "In machine learning, we usually work\n",
    "with high-dimensional datasets,\n",
    "where it is more convenient to employ\n",
    "compact linear algebra notation.\n",
    "When our inputs consist of $d$ features,\n",
    "we can assign each an index (between $1$ and $d$)\n",
    "and express our prediction $\\hat{y}$\n",
    "(in general the \"hat\" symbol denotes an estimate) as\n",
    "\n",
    "$$\\hat{y} = w_1  x_1 + \\cdots + w_d  x_d + b.$$\n",
    "\n",
    "Collecting all features into a vector $\\mathbf{x} \\in \\mathbb{R}^d$\n",
    "and all weights into a vector $\\mathbf{w} \\in \\mathbb{R}^d$,\n",
    "we can express our model compactly via the dot product\n",
    "between $\\mathbf{w}$ and $\\mathbf{x}$:\n",
    "\n",
    "$$\\hat{y} = \\mathbf{w}^\\top \\mathbf{x} + b.$$\n",
    ":eqlabel:`eq_linreg-y`\n",
    "\n",
    "In :eqref:`eq_linreg-y`, the vector $\\mathbf{x}$\n",
    "corresponds to the features of a single example.\n",
    "We will often find it convenient\n",
    "to refer to features of our entire dataset of $n$ examples\n",
    "via the *design matrix* $\\mathbf{X} \\in \\mathbb{R}^{n \\times d}$.\n",
    "Here, $\\mathbf{X}$ contains one row for every example\n",
    "and one column for every feature.\n",
    "For a collection of features $\\mathbf{X}$,\n",
    "the predictions $\\hat{\\mathbf{y}} \\in \\mathbb{R}^n$\n",
    "can be expressed via the matrix--vector product:\n",
    "\n",
    "$${\\hat{\\mathbf{y}}} = \\mathbf{X} \\mathbf{w} + b,$$\n",
    ":eqlabel:`eq_linreg-y-vec`\n",
    "\n",
    "where broadcasting (:numref:`subsec_broadcasting`) is applied during the summation.\n",
    "Given features of a training dataset $\\mathbf{X}$\n",
    "and corresponding (known) labels $\\mathbf{y}$,\n",
    "the goal of linear regression is to find\n",
    "the weight vector $\\mathbf{w}$ and the bias term $b$\n",
    "such that, given features of a new data example\n",
    "sampled from the same distribution as $\\mathbf{X}$,\n",
    "the new example's label will (in expectation)\n",
    "be predicted with the smallest error.\n",
    "\n",
    "Even if we believe that the best model for\n",
    "predicting $y$ given $\\mathbf{x}$ is linear,\n",
    "we would not expect to find a real-world dataset of $n$ examples where\n",
    "$y^{(i)}$ exactly equals $\\mathbf{w}^\\top \\mathbf{x}^{(i)}+b$\n",
    "for all $1 \\leq i \\leq n$.\n",
    "For example, whatever instruments we use to observe\n",
    "the features $\\mathbf{X}$ and labels $\\mathbf{y}$, there might be a small amount of measurement error.\n",
    "Thus, even when we are confident\n",
    "that the underlying relationship is linear,\n",
    "we will incorporate a noise term to account for such errors.\n",
    "\n",
    "Before we can go about searching for the best *parameters*\n",
    "(or *model parameters*) $\\mathbf{w}$ and $b$,\n",
    "we will need two more things:\n",
    "(i) a measure of the quality of some given model;\n",
    "and (ii) a procedure for updating the model to improve its quality.\n",
    "\n",
    "### Loss Function\n",
    ":label:`subsec_linear-regression-loss-function`\n",
    "\n",
    "Naturally, fitting our model to the data requires\n",
    "that we agree on some measure of *fitness*\n",
    "(or, equivalently, of *unfitness*).\n",
    "*Loss functions* quantify the distance\n",
    "between the *real* and *predicted* values of the target.\n",
    "The loss will usually be a nonnegative number\n",
    "where smaller values are better\n",
    "and perfect predictions incur a loss of 0.\n",
    "For regression problems, the most common loss function is the squared error.\n",
    "When our prediction for an example $i$ is $\\hat{y}^{(i)}$\n",
    "and the corresponding true label is $y^{(i)}$,\n",
    "the *squared error* is given by:\n",
    "\n",
    "$$l^{(i)}(\\mathbf{w}, b) = \\frac{1}{2} \\left(\\hat{y}^{(i)} - y^{(i)}\\right)^2.$$\n",
    ":eqlabel:`eq_mse`\n",
    "\n",
    "The constant $\\frac{1}{2}$ makes no real difference\n",
    "but proves to be notationally convenient,\n",
    "since it cancels out when we take the derivative of the loss.\n",
    "Because the training dataset is given to us,\n",
    "and thus is out of our control,\n",
    "the empirical error is only a function of the model parameters.\n",
    "In :numref:`fig_fit_linreg`, we visualize the fit of a linear regression model\n",
    "in a problem with one-dimensional inputs.\n",
    "\n",
    "![Fitting a linear regression model to one-dimensional data.](../img/fit-linreg.svg)\n",
    ":label:`fig_fit_linreg`\n",
    "\n",
    "Note that large differences between\n",
    "estimates $\\hat{y}^{(i)}$ and targets $y^{(i)}$\n",
    "lead to even larger contributions to the loss,\n",
    "due to its quadratic form\n",
    "(this quadraticity can be a double-edge sword; while it encourages the model to avoid large errors\n",
    "it can also lead to excessive sensitivity to anomalous data).\n",
    "To measure the quality of a model on the entire dataset of $n$ examples,\n",
    "we simply average (or equivalently, sum)\n",
    "the losses on the training set:\n",
    "\n",
    "$$L(\\mathbf{w}, b) =\\frac{1}{n}\\sum_{i=1}^n l^{(i)}(\\mathbf{w}, b) =\\frac{1}{n} \\sum_{i=1}^n \\frac{1}{2}\\left(\\mathbf{w}^\\top \\mathbf{x}^{(i)} + b - y^{(i)}\\right)^2.$$\n",
    "\n",
    "When training the model, we seek parameters ($\\mathbf{w}^*, b^*$)\n",
    "that minimize the total loss across all training examples:\n",
    "\n",
    "$$\\mathbf{w}^*, b^* = \\operatorname*{argmin}_{\\mathbf{w}, b}\\  L(\\mathbf{w}, b).$$\n",
    "\n",
    "### Analytic Solution\n",
    "\n",
    "Unlike most of the models that we will cover,\n",
    "linear regression presents us with\n",
    "a surprisingly easy optimization problem.\n",
    "In particular, we can find the optimal parameters\n",
    "(as assessed on the training data)\n",
    "analytically by applying a simple formula as follows.\n",
    "First, we can subsume the bias $b$ into the parameter $\\mathbf{w}$\n",
    "by appending a column to the design matrix consisting of all 1s.\n",
    "Then our prediction problem is to minimize $\\|\\mathbf{y} - \\mathbf{X}\\mathbf{w}\\|^2$.\n",
    "As long as the design matrix $\\mathbf{X}$ has full rank\n",
    "(no feature is linearly dependent on the others),\n",
    "then there will be just one critical point on the loss surface\n",
    "and it corresponds to the minimum of the loss over the entire domain.\n",
    "Taking the derivative of the loss with respect to $\\mathbf{w}$\n",
    "and setting it equal to zero yields:\n",
    "\n",
    "$$\\begin{aligned}\n",
    "    \\partial_{\\mathbf{w}} \\|\\mathbf{y} - \\mathbf{X}\\mathbf{w}\\|^2 =\n",
    "    2 \\mathbf{X}^\\top (\\mathbf{X} \\mathbf{w} - \\mathbf{y}) = 0\n",
    "    \\textrm{ and hence }\n",
    "    \\mathbf{X}^\\top \\mathbf{y} = \\mathbf{X}^\\top \\mathbf{X} \\mathbf{w}.\n",
    "\\end{aligned}$$\n",
    "\n",
    "Solving for $\\mathbf{w}$ provides us with the optimal solution\n",
    "for the optimization problem.\n",
    "Note that this solution \n",
    "\n",
    "$$\\mathbf{w}^* = (\\mathbf X^\\top \\mathbf X)^{-1}\\mathbf X^\\top \\mathbf{y}$$\n",
    "\n",
    "will only be unique\n",
    "when the matrix $\\mathbf X^\\top \\mathbf X$ is invertible,\n",
    "i.e., when the columns of the design matrix\n",
    "are linearly independent :cite:`Golub.Van-Loan.1996`.\n",
    "\n",
    "\n",
    "\n",
    "While simple problems like linear regression\n",
    "may admit analytic solutions,\n",
    "you should not get used to such good fortune.\n",
    "Although analytic solutions allow for nice mathematical analysis,\n",
    "the requirement of an analytic solution is so restrictive\n",
    "that it would exclude almost all exciting aspects of deep learning.\n",
    "\n",
    "### Minibatch Stochastic Gradient Descent\n",
    "\n",
    "Fortunately, even in cases where we cannot solve the models analytically,\n",
    "we can still often train models effectively in practice.\n",
    "Moreover, for many tasks, those hard-to-optimize models\n",
    "turn out to be so much better that figuring out how to train them\n",
    "ends up being well worth the trouble.\n",
    "\n",
    "The key technique for optimizing nearly every deep learning model,\n",
    "and which we will call upon throughout this book,\n",
    "consists of iteratively reducing the error\n",
    "by updating the parameters in the direction\n",
    "that incrementally lowers the loss function.\n",
    "This algorithm is called *gradient descent*.\n",
    "\n",
    "The most naive application of gradient descent\n",
    "consists of taking the derivative of the loss function,\n",
    "which is an average of the losses computed\n",
    "on every single example in the dataset.\n",
    "In practice, this can be extremely slow:\n",
    "we must pass over the entire dataset before making a single update,\n",
    "even if the update steps might be very powerful :cite:`Liu.Nocedal.1989`.\n",
    "Even worse, if there is a lot of redundancy in the training data,\n",
    "the benefit of a full update is limited.\n",
    "\n",
    "The other extreme is to consider only a single example at a time and to take\n",
    "update steps based on one observation at a time.\n",
    "The resulting algorithm, *stochastic gradient descent* (SGD)\n",
    "can be an effective strategy :cite:`Bottou.2010`, even for large datasets.\n",
    "Unfortunately, SGD has drawbacks, both computational and statistical.\n",
    "One problem arises from the fact that processors are a lot faster\n",
    "multiplying and adding numbers than they are\n",
    "at moving data from main memory to processor cache.\n",
    "It is up to an order of magnitude more efficient to\n",
    "perform a matrix--vector multiplication\n",
    "than a corresponding number of vector--vector operations.\n",
    "This means that it can take a lot longer to process\n",
    "one sample at a time compared to a full batch.\n",
    "A second problem is that some of the layers,\n",
    "such as batch normalization (to be described in :numref:`sec_batch_norm`),\n",
    "only work well when we have access\n",
    "to more than one observation at a time.\n",
    "\n",
    "The solution to both problems is to pick an intermediate strategy:\n",
    "rather than taking a full batch or only a single sample at a time,\n",
    "we take a *minibatch* of observations :cite:`Li.Zhang.Chen.ea.2014`.\n",
    "The specific choice of the size of the said minibatch depends on many factors,\n",
    "such as the amount of memory, the number of accelerators,\n",
    "the choice of layers, and the total dataset size.\n",
    "Despite all that, a number between 32 and 256,\n",
    "preferably a multiple of a large power of $2$, is a good start.\n",
    "This leads us to *minibatch stochastic gradient descent*.\n",
    "\n",
    "In its most basic form, in each iteration $t$,\n",
    "we first randomly sample a minibatch $\\mathcal{B}_t$\n",
    "consisting of a fixed number $|\\mathcal{B}|$ of training examples.\n",
    "We then compute the derivative (gradient) of the average loss\n",
    "on the minibatch with respect to the model parameters.\n",
    "Finally, we multiply the gradient\n",
    "by a predetermined small positive value $\\eta$,\n",
    "called the *learning rate*,\n",
    "and subtract the resulting term from the current parameter values.\n",
    "We can express the update as follows:\n",
    "\n",
    "$$(\\mathbf{w},b) \\leftarrow (\\mathbf{w},b) - \\frac{\\eta}{|\\mathcal{B}|} \\sum_{i \\in \\mathcal{B}_t} \\partial_{(\\mathbf{w},b)} l^{(i)}(\\mathbf{w},b).$$\n",
    "\n",
    "In summary, minibatch SGD proceeds as follows:\n",
    "(i) initialize the values of the model parameters, typically at random;\n",
    "(ii) iteratively sample random minibatches from the data,\n",
    "updating the parameters in the direction of the negative gradient.\n",
    "For quadratic losses and affine transformations,\n",
    "this has a closed-form expansion:\n",
    "\n",
    "$$\\begin{aligned} \\mathbf{w} & \\leftarrow \\mathbf{w} - \\frac{\\eta}{|\\mathcal{B}|} \\sum_{i \\in \\mathcal{B}_t} \\partial_{\\mathbf{w}} l^{(i)}(\\mathbf{w}, b) && = \\mathbf{w} - \\frac{\\eta}{|\\mathcal{B}|} \\sum_{i \\in \\mathcal{B}_t} \\mathbf{x}^{(i)} \\left(\\mathbf{w}^\\top \\mathbf{x}^{(i)} + b - y^{(i)}\\right)\\\\ b &\\leftarrow b -  \\frac{\\eta}{|\\mathcal{B}|} \\sum_{i \\in \\mathcal{B}_t} \\partial_b l^{(i)}(\\mathbf{w}, b) &&  = b - \\frac{\\eta}{|\\mathcal{B}|} \\sum_{i \\in \\mathcal{B}_t} \\left(\\mathbf{w}^\\top \\mathbf{x}^{(i)} + b - y^{(i)}\\right). \\end{aligned}$$\n",
    ":eqlabel:`eq_linreg_batch_update`\n",
    "\n",
    "Since we pick a minibatch $\\mathcal{B}$\n",
    "we need to normalize by its size $|\\mathcal{B}|$.\n",
    "Frequently minibatch size and learning rate are user-defined.\n",
    "Such tunable parameters that are not updated\n",
    "in the training loop are called *hyperparameters*.\n",
    "They can be tuned automatically by a number of techniques, such as Bayesian optimization\n",
    ":cite:`Frazier.2018`. In the end, the quality of the solution is\n",
    "typically assessed on a separate *validation dataset* (or *validation set*).\n",
    "\n",
    "After training for some predetermined number of iterations\n",
    "(or until some other stopping criterion is met),\n",
    "we record the estimated model parameters,\n",
    "denoted $\\hat{\\mathbf{w}}, \\hat{b}$.\n",
    "Note that even if our function is truly linear and noiseless,\n",
    "these parameters will not be the exact minimizers of the loss, nor even deterministic.\n",
    "Although the algorithm converges slowly towards the minimizers\n",
    "it typically will not find them exactly in a finite number of steps.\n",
    "Moreover, the minibatches $\\mathcal{B}$\n",
    "used for updating the parameters are chosen at random.\n",
    "This breaks determinism.\n",
    "\n",
    "Linear regression happens to be a learning problem\n",
    "with a global minimum\n",
    "(whenever $\\mathbf{X}$ is full rank, or equivalently,\n",
    "whenever $\\mathbf{X}^\\top \\mathbf{X}$ is invertible).\n",
    "However, the loss surfaces for deep networks contain many saddle points and minima.\n",
    "Fortunately, we typically do not care about finding\n",
    "an exact set of parameters but merely any set of parameters\n",
    "that leads to accurate predictions (and thus low loss).\n",
    "In practice, deep learning practitioners\n",
    "seldom struggle to find parameters\n",
    "that minimize the loss *on training sets*\n",
    ":cite:`Izmailov.Podoprikhin.Garipov.ea.2018,Frankle.Carbin.2018`.\n",
    "The more formidable task is to find parameters\n",
    "that lead to accurate predictions on previously unseen data,\n",
    "a challenge called *generalization*.\n",
    "We return to these topics throughout the book.\n",
    "\n",
    "### Predictions\n",
    "\n",
    "Given the model $\\hat{\\mathbf{w}}^\\top \\mathbf{x} + \\hat{b}$,\n",
    "we can now make *predictions* for a new example,\n",
    "e.g., predicting the sales price of a previously unseen house\n",
    "given its area $x_1$ and age $x_2$.\n",
    "Deep learning practitioners have taken to calling the prediction phase *inference*\n",
    "but this is a bit of a misnomer---*inference* refers broadly\n",
    "to any conclusion reached on the basis of evidence,\n",
    "including both the values of the parameters\n",
    "and the likely label for an unseen instance.\n",
    "If anything, in the statistics literature\n",
    "*inference* more often denotes parameter inference\n",
    "and this overloading of terminology creates unnecessary confusion\n",
    "when deep learning practitioners talk to statisticians.\n",
    "In the following we will stick to *prediction* whenever possible.\n",
    "\n",
    "\n",
    "\n",
    "## Vectorization for Speed\n",
    "\n",
    "When training our models, we typically want to process\n",
    "whole minibatches of examples simultaneously.\n",
    "Doing this efficiently requires that (**we**) (~~should~~)\n",
    "(**vectorize the calculations and leverage\n",
    "fast linear algebra libraries\n",
    "rather than writing costly for-loops in Python.**)\n",
    "\n",
    "To see why this matters so much,\n",
    "let's (**consider two methods for adding vectors.**)\n",
    "To start, we instantiate two 10,000-dimensional vectors\n",
    "containing all 1s.\n",
    "In the first method, we loop over the vectors with a Python for-loop.\n",
    "In the second, we rely on a single call to `+`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "e31ed5b7",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:25.962060Z",
     "iopub.status.busy": "2023-08-18T19:40:25.961577Z",
     "iopub.status.idle": "2023-08-18T19:40:25.986556Z",
     "shell.execute_reply": "2023-08-18T19:40:25.985693Z"
    },
    "origin_pos": 7,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [],
   "source": [
    "n = 10000\n",
    "a = torch.ones(n)\n",
    "b = torch.ones(n)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5dd38b29",
   "metadata": {
    "origin_pos": 8
   },
   "source": [
    "Now we can benchmark the workloads.\n",
    "First, [**we add them, one coordinate at a time,\n",
    "using a for-loop.**]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "ebf6b45f",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:25.991235Z",
     "iopub.status.busy": "2023-08-18T19:40:25.990635Z",
     "iopub.status.idle": "2023-08-18T19:40:26.178339Z",
     "shell.execute_reply": "2023-08-18T19:40:26.177087Z"
    },
    "origin_pos": 9,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'0.17802 sec'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c = torch.zeros(n)\n",
    "t = time.time()\n",
    "for i in range(n):\n",
    "    c[i] = a[i] + b[i]\n",
    "f'{time.time() - t:.5f} sec'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7e31f61",
   "metadata": {
    "origin_pos": 12
   },
   "source": [
    "(**Alternatively, we rely on the reloaded `+` operator to compute the elementwise sum.**)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "f23de63f",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:26.183124Z",
     "iopub.status.busy": "2023-08-18T19:40:26.182348Z",
     "iopub.status.idle": "2023-08-18T19:40:26.190223Z",
     "shell.execute_reply": "2023-08-18T19:40:26.189016Z"
    },
    "origin_pos": 13,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'0.00036 sec'"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "t = time.time()\n",
    "d = a + b\n",
    "f'{time.time() - t:.5f} sec'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59022167",
   "metadata": {
    "origin_pos": 14
   },
   "source": [
    "The second method is dramatically faster than the first.\n",
    "Vectorizing code often yields order-of-magnitude speedups.\n",
    "Moreover, we push more of the mathematics to the library\n",
    "so we do not have to write as many calculations ourselves,\n",
    "reducing the potential for errors and increasing portability of the code.\n",
    "\n",
    "\n",
    "## The Normal Distribution and Squared Loss\n",
    ":label:`subsec_normal_distribution_and_squared_loss`\n",
    "\n",
    "So far we have given a fairly functional motivation\n",
    "of the squared loss objective:\n",
    "the optimal parameters return the conditional expectation $E[Y\\mid X]$\n",
    "whenever the underlying pattern is truly linear,\n",
    "and the loss assigns large penalties for outliers.\n",
    "We can also provide a more formal motivation\n",
    "for the squared loss objective\n",
    "by making probabilistic assumptions\n",
    "about the distribution of noise.\n",
    "\n",
    "Linear regression was invented at the turn of the 19th century.\n",
    "While it has long been debated whether Gauss or Legendre\n",
    "first thought up the idea,\n",
    "it was Gauss who also discovered the normal distribution\n",
    "(also called the *Gaussian*).\n",
    "It turns out that the normal distribution\n",
    "and linear regression with squared loss\n",
    "share a deeper connection than common parentage.\n",
    "\n",
    "To begin, recall that a normal distribution\n",
    "with mean $\\mu$ and variance $\\sigma^2$ (standard deviation $\\sigma$)\n",
    "is given as\n",
    "\n",
    "$$p(x) = \\frac{1}{\\sqrt{2 \\pi \\sigma^2}} \\exp\\left(-\\frac{1}{2 \\sigma^2} (x - \\mu)^2\\right).$$\n",
    "\n",
    "Below [**we define a function to compute the normal distribution**].\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "c1c4cb4d",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:26.194623Z",
     "iopub.status.busy": "2023-08-18T19:40:26.193899Z",
     "iopub.status.idle": "2023-08-18T19:40:26.200266Z",
     "shell.execute_reply": "2023-08-18T19:40:26.199057Z"
    },
    "origin_pos": 15,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [],
   "source": [
    "def normal(x, mu, sigma):\n",
    "    p = 1 / math.sqrt(2 * math.pi * sigma**2)\n",
    "    return p * np.exp(-0.5 * (x - mu)**2 / sigma**2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8378d02",
   "metadata": {
    "origin_pos": 16
   },
   "source": [
    "We can now (**visualize the normal distributions**).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "32081aba",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:26.204882Z",
     "iopub.status.busy": "2023-08-18T19:40:26.203939Z",
     "iopub.status.idle": "2023-08-18T19:40:26.632371Z",
     "shell.execute_reply": "2023-08-18T19:40:26.631518Z"
    },
    "origin_pos": 18,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       "  \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<svg xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"302.08125pt\" height=\"183.35625pt\" viewBox=\"0 0 302.08125 183.35625\" xmlns=\"http://www.w3.org/2000/svg\" version=\"1.1\">\n",
       " <metadata>\n",
       "  <rdf:RDF xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:cc=\"http://creativecommons.org/ns#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n",
       "   <cc:Work>\n",
       "    <dc:type rdf:resource=\"http://purl.org/dc/dcmitype/StillImage\"/>\n",
       "    <dc:date>2023-08-18T19:40:26.559135</dc:date>\n",
       "    <dc:format>image/svg+xml</dc:format>\n",
       "    <dc:creator>\n",
       "     <cc:Agent>\n",
       "      <dc:title>Matplotlib v3.7.2, https://matplotlib.org/</dc:title>\n",
       "     </cc:Agent>\n",
       "    </dc:creator>\n",
       "   </cc:Work>\n",
       "  </rdf:RDF>\n",
       " </metadata>\n",
       " <defs>\n",
       "  <style type=\"text/css\">*{stroke-linejoin: round; stroke-linecap: butt}</style>\n",
       " </defs>\n",
       " <g id=\"figure_1\">\n",
       "  <g id=\"patch_1\">\n",
       "   <path d=\"M 0 183.35625 \n",
       "L 302.08125 183.35625 \n",
       "L 302.08125 0 \n",
       "L 0 0 \n",
       "z\n",
       "\" style=\"fill: #ffffff\"/>\n",
       "  </g>\n",
       "  <g id=\"axes_1\">\n",
       "   <g id=\"patch_2\">\n",
       "    <path d=\"M 43.78125 145.8 \n",
       "L 294.88125 145.8 \n",
       "L 294.88125 7.2 \n",
       "L 43.78125 7.2 \n",
       "z\n",
       "\" style=\"fill: #ffffff\"/>\n",
       "   </g>\n",
       "   <g id=\"matplotlib.axis_1\">\n",
       "    <g id=\"xtick_1\">\n",
       "     <g id=\"line2d_1\">\n",
       "      <path d=\"M 71.511736 145.8 \n",
       "L 71.511736 7.2 \n",
       "\" clip-path=\"url(#pd34fc29185)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_2\">\n",
       "      <defs>\n",
       "       <path id=\"mb329bad76f\" d=\"M 0 0 \n",
       "L 0 3.5 \n",
       "\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
       "      </defs>\n",
       "      <g>\n",
       "       <use xlink:href=\"#mb329bad76f\" x=\"71.511736\" y=\"145.8\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "     <g id=\"text_1\">\n",
       "      <!-- −6 -->\n",
       "      <g transform=\"translate(64.140642 160.398438) scale(0.1 -0.1)\">\n",
       "       <defs>\n",
       "        <path id=\"DejaVuSans-2212\" d=\"M 678 2272 \n",
       "L 4684 2272 \n",
       "L 4684 1741 \n",
       "L 678 1741 \n",
       "L 678 2272 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "        <path id=\"DejaVuSans-36\" d=\"M 2113 2584 \n",
       "Q 1688 2584 1439 2293 \n",
       "Q 1191 2003 1191 1497 \n",
       "Q 1191 994 1439 701 \n",
       "Q 1688 409 2113 409 \n",
       "Q 2538 409 2786 701 \n",
       "Q 3034 994 3034 1497 \n",
       "Q 3034 2003 2786 2293 \n",
       "Q 2538 2584 2113 2584 \n",
       "z\n",
       "M 3366 4563 \n",
       "L 3366 3988 \n",
       "Q 3128 4100 2886 4159 \n",
       "Q 2644 4219 2406 4219 \n",
       "Q 1781 4219 1451 3797 \n",
       "Q 1122 3375 1075 2522 \n",
       "Q 1259 2794 1537 2939 \n",
       "Q 1816 3084 2150 3084 \n",
       "Q 2853 3084 3261 2657 \n",
       "Q 3669 2231 3669 1497 \n",
       "Q 3669 778 3244 343 \n",
       "Q 2819 -91 2113 -91 \n",
       "Q 1303 -91 875 529 \n",
       "Q 447 1150 447 2328 \n",
       "Q 447 3434 972 4092 \n",
       "Q 1497 4750 2381 4750 \n",
       "Q 2619 4750 2861 4703 \n",
       "Q 3103 4656 3366 4563 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       </defs>\n",
       "       <use xlink:href=\"#DejaVuSans-2212\"/>\n",
       "       <use xlink:href=\"#DejaVuSans-36\" x=\"83.789062\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_2\">\n",
       "     <g id=\"line2d_3\">\n",
       "      <path d=\"M 104.145435 145.8 \n",
       "L 104.145435 7.2 \n",
       "\" clip-path=\"url(#pd34fc29185)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_4\">\n",
       "      <g>\n",
       "       <use xlink:href=\"#mb329bad76f\" x=\"104.145435\" y=\"145.8\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "     <g id=\"text_2\">\n",
       "      <!-- −4 -->\n",
       "      <g transform=\"translate(96.774342 160.398438) scale(0.1 -0.1)\">\n",
       "       <defs>\n",
       "        <path id=\"DejaVuSans-34\" d=\"M 2419 4116 \n",
       "L 825 1625 \n",
       "L 2419 1625 \n",
       "L 2419 4116 \n",
       "z\n",
       "M 2253 4666 \n",
       "L 3047 4666 \n",
       "L 3047 1625 \n",
       "L 3713 1625 \n",
       "L 3713 1100 \n",
       "L 3047 1100 \n",
       "L 3047 0 \n",
       "L 2419 0 \n",
       "L 2419 1100 \n",
       "L 313 1100 \n",
       "L 313 1709 \n",
       "L 2253 4666 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       </defs>\n",
       "       <use xlink:href=\"#DejaVuSans-2212\"/>\n",
       "       <use xlink:href=\"#DejaVuSans-34\" x=\"83.789062\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_3\">\n",
       "     <g id=\"line2d_5\">\n",
       "      <path d=\"M 136.779135 145.8 \n",
       "L 136.779135 7.2 \n",
       "\" clip-path=\"url(#pd34fc29185)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_6\">\n",
       "      <g>\n",
       "       <use xlink:href=\"#mb329bad76f\" x=\"136.779135\" y=\"145.8\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "     <g id=\"text_3\">\n",
       "      <!-- −2 -->\n",
       "      <g transform=\"translate(129.408041 160.398438) scale(0.1 -0.1)\">\n",
       "       <defs>\n",
       "        <path id=\"DejaVuSans-32\" d=\"M 1228 531 \n",
       "L 3431 531 \n",
       "L 3431 0 \n",
       "L 469 0 \n",
       "L 469 531 \n",
       "Q 828 903 1448 1529 \n",
       "Q 2069 2156 2228 2338 \n",
       "Q 2531 2678 2651 2914 \n",
       "Q 2772 3150 2772 3378 \n",
       "Q 2772 3750 2511 3984 \n",
       "Q 2250 4219 1831 4219 \n",
       "Q 1534 4219 1204 4116 \n",
       "Q 875 4013 500 3803 \n",
       "L 500 4441 \n",
       "Q 881 4594 1212 4672 \n",
       "Q 1544 4750 1819 4750 \n",
       "Q 2544 4750 2975 4387 \n",
       "Q 3406 4025 3406 3419 \n",
       "Q 3406 3131 3298 2873 \n",
       "Q 3191 2616 2906 2266 \n",
       "Q 2828 2175 2409 1742 \n",
       "Q 1991 1309 1228 531 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       </defs>\n",
       "       <use xlink:href=\"#DejaVuSans-2212\"/>\n",
       "       <use xlink:href=\"#DejaVuSans-32\" x=\"83.789062\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_4\">\n",
       "     <g id=\"line2d_7\">\n",
       "      <path d=\"M 169.412834 145.8 \n",
       "L 169.412834 7.2 \n",
       "\" clip-path=\"url(#pd34fc29185)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_8\">\n",
       "      <g>\n",
       "       <use xlink:href=\"#mb329bad76f\" x=\"169.412834\" y=\"145.8\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "     <g id=\"text_4\">\n",
       "      <!-- 0 -->\n",
       "      <g transform=\"translate(166.231584 160.398438) scale(0.1 -0.1)\">\n",
       "       <defs>\n",
       "        <path id=\"DejaVuSans-30\" d=\"M 2034 4250 \n",
       "Q 1547 4250 1301 3770 \n",
       "Q 1056 3291 1056 2328 \n",
       "Q 1056 1369 1301 889 \n",
       "Q 1547 409 2034 409 \n",
       "Q 2525 409 2770 889 \n",
       "Q 3016 1369 3016 2328 \n",
       "Q 3016 3291 2770 3770 \n",
       "Q 2525 4250 2034 4250 \n",
       "z\n",
       "M 2034 4750 \n",
       "Q 2819 4750 3233 4129 \n",
       "Q 3647 3509 3647 2328 \n",
       "Q 3647 1150 3233 529 \n",
       "Q 2819 -91 2034 -91 \n",
       "Q 1250 -91 836 529 \n",
       "Q 422 1150 422 2328 \n",
       "Q 422 3509 836 4129 \n",
       "Q 1250 4750 2034 4750 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       </defs>\n",
       "       <use xlink:href=\"#DejaVuSans-30\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_5\">\n",
       "     <g id=\"line2d_9\">\n",
       "      <path d=\"M 202.046534 145.8 \n",
       "L 202.046534 7.2 \n",
       "\" clip-path=\"url(#pd34fc29185)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_10\">\n",
       "      <g>\n",
       "       <use xlink:href=\"#mb329bad76f\" x=\"202.046534\" y=\"145.8\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "     <g id=\"text_5\">\n",
       "      <!-- 2 -->\n",
       "      <g transform=\"translate(198.865284 160.398438) scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-32\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_6\">\n",
       "     <g id=\"line2d_11\">\n",
       "      <path d=\"M 234.680233 145.8 \n",
       "L 234.680233 7.2 \n",
       "\" clip-path=\"url(#pd34fc29185)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_12\">\n",
       "      <g>\n",
       "       <use xlink:href=\"#mb329bad76f\" x=\"234.680233\" y=\"145.8\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "     <g id=\"text_6\">\n",
       "      <!-- 4 -->\n",
       "      <g transform=\"translate(231.498983 160.398438) scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-34\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_7\">\n",
       "     <g id=\"line2d_13\">\n",
       "      <path d=\"M 267.313932 145.8 \n",
       "L 267.313932 7.2 \n",
       "\" clip-path=\"url(#pd34fc29185)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_14\">\n",
       "      <g>\n",
       "       <use xlink:href=\"#mb329bad76f\" x=\"267.313932\" y=\"145.8\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "     <g id=\"text_7\">\n",
       "      <!-- 6 -->\n",
       "      <g transform=\"translate(264.132682 160.398438) scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-36\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"text_8\">\n",
       "     <!-- x -->\n",
       "     <g transform=\"translate(166.371875 174.076563) scale(0.1 -0.1)\">\n",
       "      <defs>\n",
       "       <path id=\"DejaVuSans-78\" d=\"M 3513 3500 \n",
       "L 2247 1797 \n",
       "L 3578 0 \n",
       "L 2900 0 \n",
       "L 1881 1375 \n",
       "L 863 0 \n",
       "L 184 0 \n",
       "L 1544 1831 \n",
       "L 300 3500 \n",
       "L 978 3500 \n",
       "L 1906 2253 \n",
       "L 2834 3500 \n",
       "L 3513 3500 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "      </defs>\n",
       "      <use xlink:href=\"#DejaVuSans-78\"/>\n",
       "     </g>\n",
       "    </g>\n",
       "   </g>\n",
       "   <g id=\"matplotlib.axis_2\">\n",
       "    <g id=\"ytick_1\">\n",
       "     <g id=\"line2d_15\">\n",
       "      <path d=\"M 43.78125 139.5 \n",
       "L 294.88125 139.5 \n",
       "\" clip-path=\"url(#pd34fc29185)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_16\">\n",
       "      <defs>\n",
       "       <path id=\"me8e75e7cba\" d=\"M 0 0 \n",
       "L -3.5 0 \n",
       "\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
       "      </defs>\n",
       "      <g>\n",
       "       <use xlink:href=\"#me8e75e7cba\" x=\"43.78125\" y=\"139.5\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "     <g id=\"text_9\">\n",
       "      <!-- 0.0 -->\n",
       "      <g transform=\"translate(20.878125 143.299219) scale(0.1 -0.1)\">\n",
       "       <defs>\n",
       "        <path id=\"DejaVuSans-2e\" d=\"M 684 794 \n",
       "L 1344 794 \n",
       "L 1344 0 \n",
       "L 684 0 \n",
       "L 684 794 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       </defs>\n",
       "       <use xlink:href=\"#DejaVuSans-30\"/>\n",
       "       <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
       "       <use xlink:href=\"#DejaVuSans-30\" x=\"95.410156\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"ytick_2\">\n",
       "     <g id=\"line2d_17\">\n",
       "      <path d=\"M 43.78125 107.916484 \n",
       "L 294.88125 107.916484 \n",
       "\" clip-path=\"url(#pd34fc29185)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_18\">\n",
       "      <g>\n",
       "       <use xlink:href=\"#me8e75e7cba\" x=\"43.78125\" y=\"107.916484\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "     <g id=\"text_10\">\n",
       "      <!-- 0.1 -->\n",
       "      <g transform=\"translate(20.878125 111.715702) scale(0.1 -0.1)\">\n",
       "       <defs>\n",
       "        <path id=\"DejaVuSans-31\" d=\"M 794 531 \n",
       "L 1825 531 \n",
       "L 1825 4091 \n",
       "L 703 3866 \n",
       "L 703 4441 \n",
       "L 1819 4666 \n",
       "L 2450 4666 \n",
       "L 2450 531 \n",
       "L 3481 531 \n",
       "L 3481 0 \n",
       "L 794 0 \n",
       "L 794 531 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       </defs>\n",
       "       <use xlink:href=\"#DejaVuSans-30\"/>\n",
       "       <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
       "       <use xlink:href=\"#DejaVuSans-31\" x=\"95.410156\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"ytick_3\">\n",
       "     <g id=\"line2d_19\">\n",
       "      <path d=\"M 43.78125 76.332967 \n",
       "L 294.88125 76.332967 \n",
       "\" clip-path=\"url(#pd34fc29185)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_20\">\n",
       "      <g>\n",
       "       <use xlink:href=\"#me8e75e7cba\" x=\"43.78125\" y=\"76.332967\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "     <g id=\"text_11\">\n",
       "      <!-- 0.2 -->\n",
       "      <g transform=\"translate(20.878125 80.132186) scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-30\"/>\n",
       "       <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
       "       <use xlink:href=\"#DejaVuSans-32\" x=\"95.410156\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"ytick_4\">\n",
       "     <g id=\"line2d_21\">\n",
       "      <path d=\"M 43.78125 44.749451 \n",
       "L 294.88125 44.749451 \n",
       "\" clip-path=\"url(#pd34fc29185)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_22\">\n",
       "      <g>\n",
       "       <use xlink:href=\"#me8e75e7cba\" x=\"43.78125\" y=\"44.749451\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "     <g id=\"text_12\">\n",
       "      <!-- 0.3 -->\n",
       "      <g transform=\"translate(20.878125 48.54867) scale(0.1 -0.1)\">\n",
       "       <defs>\n",
       "        <path id=\"DejaVuSans-33\" d=\"M 2597 2516 \n",
       "Q 3050 2419 3304 2112 \n",
       "Q 3559 1806 3559 1356 \n",
       "Q 3559 666 3084 287 \n",
       "Q 2609 -91 1734 -91 \n",
       "Q 1441 -91 1130 -33 \n",
       "Q 819 25 488 141 \n",
       "L 488 750 \n",
       "Q 750 597 1062 519 \n",
       "Q 1375 441 1716 441 \n",
       "Q 2309 441 2620 675 \n",
       "Q 2931 909 2931 1356 \n",
       "Q 2931 1769 2642 2001 \n",
       "Q 2353 2234 1838 2234 \n",
       "L 1294 2234 \n",
       "L 1294 2753 \n",
       "L 1863 2753 \n",
       "Q 2328 2753 2575 2939 \n",
       "Q 2822 3125 2822 3475 \n",
       "Q 2822 3834 2567 4026 \n",
       "Q 2313 4219 1838 4219 \n",
       "Q 1578 4219 1281 4162 \n",
       "Q 984 4106 628 3988 \n",
       "L 628 4550 \n",
       "Q 988 4650 1302 4700 \n",
       "Q 1616 4750 1894 4750 \n",
       "Q 2613 4750 3031 4423 \n",
       "Q 3450 4097 3450 3541 \n",
       "Q 3450 3153 3228 2886 \n",
       "Q 3006 2619 2597 2516 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       </defs>\n",
       "       <use xlink:href=\"#DejaVuSans-30\"/>\n",
       "       <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
       "       <use xlink:href=\"#DejaVuSans-33\" x=\"95.410156\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"ytick_5\">\n",
       "     <g id=\"line2d_23\">\n",
       "      <path d=\"M 43.78125 13.165935 \n",
       "L 294.88125 13.165935 \n",
       "\" clip-path=\"url(#pd34fc29185)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_24\">\n",
       "      <g>\n",
       "       <use xlink:href=\"#me8e75e7cba\" x=\"43.78125\" y=\"13.165935\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "     <g id=\"text_13\">\n",
       "      <!-- 0.4 -->\n",
       "      <g transform=\"translate(20.878125 16.965154) scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-30\"/>\n",
       "       <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
       "       <use xlink:href=\"#DejaVuSans-34\" x=\"95.410156\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"text_14\">\n",
       "     <!-- p(x) -->\n",
       "     <g transform=\"translate(14.798438 86.535156) rotate(-90) scale(0.1 -0.1)\">\n",
       "      <defs>\n",
       "       <path id=\"DejaVuSans-70\" d=\"M 1159 525 \n",
       "L 1159 -1331 \n",
       "L 581 -1331 \n",
       "L 581 3500 \n",
       "L 1159 3500 \n",
       "L 1159 2969 \n",
       "Q 1341 3281 1617 3432 \n",
       "Q 1894 3584 2278 3584 \n",
       "Q 2916 3584 3314 3078 \n",
       "Q 3713 2572 3713 1747 \n",
       "Q 3713 922 3314 415 \n",
       "Q 2916 -91 2278 -91 \n",
       "Q 1894 -91 1617 61 \n",
       "Q 1341 213 1159 525 \n",
       "z\n",
       "M 3116 1747 \n",
       "Q 3116 2381 2855 2742 \n",
       "Q 2594 3103 2138 3103 \n",
       "Q 1681 3103 1420 2742 \n",
       "Q 1159 2381 1159 1747 \n",
       "Q 1159 1113 1420 752 \n",
       "Q 1681 391 2138 391 \n",
       "Q 2594 391 2855 752 \n",
       "Q 3116 1113 3116 1747 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       <path id=\"DejaVuSans-28\" d=\"M 1984 4856 \n",
       "Q 1566 4138 1362 3434 \n",
       "Q 1159 2731 1159 2009 \n",
       "Q 1159 1288 1364 580 \n",
       "Q 1569 -128 1984 -844 \n",
       "L 1484 -844 \n",
       "Q 1016 -109 783 600 \n",
       "Q 550 1309 550 2009 \n",
       "Q 550 2706 781 3412 \n",
       "Q 1013 4119 1484 4856 \n",
       "L 1984 4856 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       <path id=\"DejaVuSans-29\" d=\"M 513 4856 \n",
       "L 1013 4856 \n",
       "Q 1481 4119 1714 3412 \n",
       "Q 1947 2706 1947 2009 \n",
       "Q 1947 1309 1714 600 \n",
       "Q 1481 -109 1013 -844 \n",
       "L 513 -844 \n",
       "Q 928 -128 1133 580 \n",
       "Q 1338 1288 1338 2009 \n",
       "Q 1338 2731 1133 3434 \n",
       "Q 928 4138 513 4856 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "      </defs>\n",
       "      <use xlink:href=\"#DejaVuSans-70\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-28\" x=\"63.476562\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-78\" x=\"102.490234\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-29\" x=\"161.669922\"/>\n",
       "     </g>\n",
       "    </g>\n",
       "   </g>\n",
       "   <g id=\"line2d_25\">\n",
       "    <path d=\"M 55.194886 139.5 \n",
       "L 108.061479 139.392742 \n",
       "L 113.44604 139.148719 \n",
       "L 117.035747 138.770771 \n",
       "L 119.809611 138.25954 \n",
       "L 122.09397 137.619981 \n",
       "L 124.051992 136.856558 \n",
       "L 125.846846 135.932618 \n",
       "L 127.478531 134.864093 \n",
       "L 129.110215 133.53546 \n",
       "L 130.7419 131.902404 \n",
       "L 132.373585 129.918522 \n",
       "L 133.842102 127.79423 \n",
       "L 135.473787 125.01546 \n",
       "L 137.105472 121.755388 \n",
       "L 138.737157 117.977866 \n",
       "L 140.368842 113.655911 \n",
       "L 142.163695 108.255858 \n",
       "L 144.121717 101.596938 \n",
       "L 146.242908 93.525729 \n",
       "L 148.690435 83.248545 \n",
       "L 151.790637 69.178196 \n",
       "L 157.827871 41.57206 \n",
       "L 159.949061 33.006718 \n",
       "L 161.743915 26.675681 \n",
       "L 163.212431 22.276554 \n",
       "L 164.517779 19.044317 \n",
       "L 165.659959 16.789011 \n",
       "L 166.63897 15.307609 \n",
       "L 167.454812 14.403942 \n",
       "L 168.270655 13.808322 \n",
       "L 168.923329 13.556687 \n",
       "L 169.576003 13.5063 \n",
       "L 170.228677 13.657402 \n",
       "L 170.881351 14.009268 \n",
       "L 171.697193 14.728769 \n",
       "L 172.513036 15.753897 \n",
       "L 173.492047 17.376612 \n",
       "L 174.634226 19.788832 \n",
       "L 175.776406 22.726997 \n",
       "L 177.081754 26.675681 \n",
       "L 178.713439 32.392617 \n",
       "L 180.671461 40.191426 \n",
       "L 183.118988 50.957567 \n",
       "L 192.256424 92.210802 \n",
       "L 194.540783 101.006788 \n",
       "L 196.498805 107.731288 \n",
       "L 198.456827 113.655911 \n",
       "L 200.25168 118.379758 \n",
       "L 201.883365 122.104146 \n",
       "L 203.51505 125.314336 \n",
       "L 205.146735 128.047227 \n",
       "L 206.77842 130.345625 \n",
       "L 208.410105 132.255577 \n",
       "L 210.04179 133.824084 \n",
       "L 211.673475 135.097239 \n",
       "L 213.468328 136.208702 \n",
       "L 215.42635 137.136646 \n",
       "L 217.547541 137.875773 \n",
       "L 219.8319 138.435791 \n",
       "L 222.605764 138.879679 \n",
       "L 226.032303 139.193999 \n",
       "L 230.927358 139.396705 \n",
       "L 239.412119 139.487295 \n",
       "L 277.104042 139.5 \n",
       "L 283.467614 139.5 \n",
       "L 283.467614 139.5 \n",
       "\" clip-path=\"url(#pd34fc29185)\" style=\"fill: none; stroke: #1f77b4; stroke-width: 1.5; stroke-linecap: square\"/>\n",
       "   </g>\n",
       "   <g id=\"line2d_26\">\n",
       "    <path d=\"M 55.194886 139.362188 \n",
       "L 65.63767 139.098708 \n",
       "L 72.980253 138.699787 \n",
       "L 78.69115 138.178279 \n",
       "L 83.586205 137.516853 \n",
       "L 87.991754 136.697191 \n",
       "L 91.907798 135.745999 \n",
       "L 95.497505 134.654637 \n",
       "L 98.924044 133.387507 \n",
       "L 102.187413 131.951607 \n",
       "L 105.450783 130.271147 \n",
       "L 108.714153 128.328848 \n",
       "L 111.977523 126.112322 \n",
       "L 115.240893 123.615644 \n",
       "L 118.504263 120.840886 \n",
       "L 121.930802 117.640793 \n",
       "L 125.683677 113.829699 \n",
       "L 130.089226 109.01843 \n",
       "L 136.126461 102.052696 \n",
       "L 144.448054 92.482389 \n",
       "L 148.364098 88.331596 \n",
       "L 151.627468 85.194678 \n",
       "L 154.401333 82.824913 \n",
       "L 157.012028 80.888277 \n",
       "L 159.296387 79.455574 \n",
       "L 161.580746 78.288522 \n",
       "L 163.701937 77.457339 \n",
       "L 165.659959 76.915213 \n",
       "L 167.617981 76.595215 \n",
       "L 169.576003 76.500787 \n",
       "L 171.534025 76.632947 \n",
       "L 173.492047 76.99027 \n",
       "L 175.450069 77.568916 \n",
       "L 177.571259 78.438306 \n",
       "L 179.69245 79.54932 \n",
       "L 181.976809 81.000265 \n",
       "L 184.424336 82.824913 \n",
       "L 187.035032 85.047175 \n",
       "L 189.972065 87.839952 \n",
       "L 193.398603 91.412502 \n",
       "L 197.804153 96.3499 \n",
       "L 212.978823 113.657453 \n",
       "L 216.894867 117.640793 \n",
       "L 220.484574 120.986092 \n",
       "L 223.911112 123.877929 \n",
       "L 227.174482 126.346542 \n",
       "L 230.437852 128.535259 \n",
       "L 233.701222 130.450724 \n",
       "L 236.964592 132.105883 \n",
       "L 240.227962 133.51842 \n",
       "L 243.6545 134.763387 \n",
       "L 247.244207 135.834289 \n",
       "L 251.160251 136.766392 \n",
       "L 255.402632 137.542786 \n",
       "L 260.134519 138.178279 \n",
       "L 265.682247 138.687887 \n",
       "L 272.372156 139.065638 \n",
       "L 281.020086 139.318239 \n",
       "L 283.467614 139.359757 \n",
       "L 283.467614 139.359757 \n",
       "\" clip-path=\"url(#pd34fc29185)\" style=\"fill: none; stroke-dasharray: 5.55,2.4; stroke-dashoffset: 0; stroke: #bf00bf; stroke-width: 1.5\"/>\n",
       "   </g>\n",
       "   <g id=\"line2d_27\">\n",
       "    <path d=\"M 55.194886 139.5 \n",
       "L 157.012028 139.392742 \n",
       "L 162.396589 139.148719 \n",
       "L 165.986296 138.770771 \n",
       "L 168.76016 138.25954 \n",
       "L 171.044519 137.619981 \n",
       "L 173.002541 136.856558 \n",
       "L 174.797395 135.932618 \n",
       "L 176.42908 134.864093 \n",
       "L 178.060765 133.53546 \n",
       "L 179.69245 131.902404 \n",
       "L 181.324135 129.918522 \n",
       "L 182.792651 127.79423 \n",
       "L 184.424336 125.01546 \n",
       "L 186.056021 121.755388 \n",
       "L 187.687706 117.977866 \n",
       "L 189.319391 113.655911 \n",
       "L 191.114244 108.255858 \n",
       "L 193.072266 101.596938 \n",
       "L 195.193457 93.525729 \n",
       "L 197.640984 83.248545 \n",
       "L 200.741186 69.178196 \n",
       "L 206.77842 41.57206 \n",
       "L 208.899611 33.006718 \n",
       "L 210.694464 26.675681 \n",
       "L 212.16298 22.276554 \n",
       "L 213.468328 19.044317 \n",
       "L 214.610508 16.789011 \n",
       "L 215.589519 15.307609 \n",
       "L 216.405361 14.403942 \n",
       "L 217.221204 13.808322 \n",
       "L 217.873878 13.556687 \n",
       "L 218.526552 13.5063 \n",
       "L 219.179226 13.657402 \n",
       "L 219.8319 14.009268 \n",
       "L 220.647742 14.728769 \n",
       "L 221.463585 15.753897 \n",
       "L 222.442596 17.376612 \n",
       "L 223.584775 19.788832 \n",
       "L 224.726955 22.726997 \n",
       "L 226.032303 26.675681 \n",
       "L 227.663988 32.392617 \n",
       "L 229.62201 40.191426 \n",
       "L 232.069537 50.957567 \n",
       "L 241.206973 92.210802 \n",
       "L 243.491332 101.006788 \n",
       "L 245.449354 107.731288 \n",
       "L 247.407376 113.655911 \n",
       "L 249.202229 118.379758 \n",
       "L 250.833914 122.104146 \n",
       "L 252.465599 125.314336 \n",
       "L 254.097284 128.047227 \n",
       "L 255.728969 130.345625 \n",
       "L 257.360654 132.255577 \n",
       "L 258.992339 133.824084 \n",
       "L 260.624024 135.097239 \n",
       "L 262.418878 136.208702 \n",
       "L 264.376899 137.136646 \n",
       "L 266.49809 137.875773 \n",
       "L 268.782449 138.435791 \n",
       "L 271.556313 138.879679 \n",
       "L 274.982852 139.193999 \n",
       "L 279.877907 139.396705 \n",
       "L 283.467614 139.456009 \n",
       "L 283.467614 139.456009 \n",
       "\" clip-path=\"url(#pd34fc29185)\" style=\"fill: none; stroke-dasharray: 9.6,2.4,1.5,2.4; stroke-dashoffset: 0; stroke: #008000; stroke-width: 1.5\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_3\">\n",
       "    <path d=\"M 43.78125 145.8 \n",
       "L 43.78125 7.2 \n",
       "\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_4\">\n",
       "    <path d=\"M 294.88125 145.8 \n",
       "L 294.88125 7.2 \n",
       "\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_5\">\n",
       "    <path d=\"M 43.78125 145.8 \n",
       "L 294.88125 145.8 \n",
       "\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_6\">\n",
       "    <path d=\"M 43.78125 7.2 \n",
       "L 294.88125 7.2 \n",
       "\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
       "   </g>\n",
       "   <g id=\"legend_1\">\n",
       "    <g id=\"patch_7\">\n",
       "     <path d=\"M 50.78125 59.234375 \n",
       "L 152.05625 59.234375 \n",
       "Q 154.05625 59.234375 154.05625 57.234375 \n",
       "L 154.05625 14.2 \n",
       "Q 154.05625 12.2 152.05625 12.2 \n",
       "L 50.78125 12.2 \n",
       "Q 48.78125 12.2 48.78125 14.2 \n",
       "L 48.78125 57.234375 \n",
       "Q 48.78125 59.234375 50.78125 59.234375 \n",
       "z\n",
       "\" style=\"fill: #ffffff; opacity: 0.8; stroke: #cccccc; stroke-linejoin: miter\"/>\n",
       "    </g>\n",
       "    <g id=\"line2d_28\">\n",
       "     <path d=\"M 52.78125 20.298438 \n",
       "L 62.78125 20.298438 \n",
       "L 72.78125 20.298438 \n",
       "\" style=\"fill: none; stroke: #1f77b4; stroke-width: 1.5; stroke-linecap: square\"/>\n",
       "    </g>\n",
       "    <g id=\"text_15\">\n",
       "     <!-- mean 0, std 1 -->\n",
       "     <g transform=\"translate(80.78125 23.798438) scale(0.1 -0.1)\">\n",
       "      <defs>\n",
       "       <path id=\"DejaVuSans-6d\" d=\"M 3328 2828 \n",
       "Q 3544 3216 3844 3400 \n",
       "Q 4144 3584 4550 3584 \n",
       "Q 5097 3584 5394 3201 \n",
       "Q 5691 2819 5691 2113 \n",
       "L 5691 0 \n",
       "L 5113 0 \n",
       "L 5113 2094 \n",
       "Q 5113 2597 4934 2840 \n",
       "Q 4756 3084 4391 3084 \n",
       "Q 3944 3084 3684 2787 \n",
       "Q 3425 2491 3425 1978 \n",
       "L 3425 0 \n",
       "L 2847 0 \n",
       "L 2847 2094 \n",
       "Q 2847 2600 2669 2842 \n",
       "Q 2491 3084 2119 3084 \n",
       "Q 1678 3084 1418 2786 \n",
       "Q 1159 2488 1159 1978 \n",
       "L 1159 0 \n",
       "L 581 0 \n",
       "L 581 3500 \n",
       "L 1159 3500 \n",
       "L 1159 2956 \n",
       "Q 1356 3278 1631 3431 \n",
       "Q 1906 3584 2284 3584 \n",
       "Q 2666 3584 2933 3390 \n",
       "Q 3200 3197 3328 2828 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       <path id=\"DejaVuSans-65\" d=\"M 3597 1894 \n",
       "L 3597 1613 \n",
       "L 953 1613 \n",
       "Q 991 1019 1311 708 \n",
       "Q 1631 397 2203 397 \n",
       "Q 2534 397 2845 478 \n",
       "Q 3156 559 3463 722 \n",
       "L 3463 178 \n",
       "Q 3153 47 2828 -22 \n",
       "Q 2503 -91 2169 -91 \n",
       "Q 1331 -91 842 396 \n",
       "Q 353 884 353 1716 \n",
       "Q 353 2575 817 3079 \n",
       "Q 1281 3584 2069 3584 \n",
       "Q 2775 3584 3186 3129 \n",
       "Q 3597 2675 3597 1894 \n",
       "z\n",
       "M 3022 2063 \n",
       "Q 3016 2534 2758 2815 \n",
       "Q 2500 3097 2075 3097 \n",
       "Q 1594 3097 1305 2825 \n",
       "Q 1016 2553 972 2059 \n",
       "L 3022 2063 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       <path id=\"DejaVuSans-61\" d=\"M 2194 1759 \n",
       "Q 1497 1759 1228 1600 \n",
       "Q 959 1441 959 1056 \n",
       "Q 959 750 1161 570 \n",
       "Q 1363 391 1709 391 \n",
       "Q 2188 391 2477 730 \n",
       "Q 2766 1069 2766 1631 \n",
       "L 2766 1759 \n",
       "L 2194 1759 \n",
       "z\n",
       "M 3341 1997 \n",
       "L 3341 0 \n",
       "L 2766 0 \n",
       "L 2766 531 \n",
       "Q 2569 213 2275 61 \n",
       "Q 1981 -91 1556 -91 \n",
       "Q 1019 -91 701 211 \n",
       "Q 384 513 384 1019 \n",
       "Q 384 1609 779 1909 \n",
       "Q 1175 2209 1959 2209 \n",
       "L 2766 2209 \n",
       "L 2766 2266 \n",
       "Q 2766 2663 2505 2880 \n",
       "Q 2244 3097 1772 3097 \n",
       "Q 1472 3097 1187 3025 \n",
       "Q 903 2953 641 2809 \n",
       "L 641 3341 \n",
       "Q 956 3463 1253 3523 \n",
       "Q 1550 3584 1831 3584 \n",
       "Q 2591 3584 2966 3190 \n",
       "Q 3341 2797 3341 1997 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       <path id=\"DejaVuSans-6e\" d=\"M 3513 2113 \n",
       "L 3513 0 \n",
       "L 2938 0 \n",
       "L 2938 2094 \n",
       "Q 2938 2591 2744 2837 \n",
       "Q 2550 3084 2163 3084 \n",
       "Q 1697 3084 1428 2787 \n",
       "Q 1159 2491 1159 1978 \n",
       "L 1159 0 \n",
       "L 581 0 \n",
       "L 581 3500 \n",
       "L 1159 3500 \n",
       "L 1159 2956 \n",
       "Q 1366 3272 1645 3428 \n",
       "Q 1925 3584 2291 3584 \n",
       "Q 2894 3584 3203 3211 \n",
       "Q 3513 2838 3513 2113 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       <path id=\"DejaVuSans-20\" transform=\"scale(0.015625)\"/>\n",
       "       <path id=\"DejaVuSans-2c\" d=\"M 750 794 \n",
       "L 1409 794 \n",
       "L 1409 256 \n",
       "L 897 -744 \n",
       "L 494 -744 \n",
       "L 750 256 \n",
       "L 750 794 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       <path id=\"DejaVuSans-73\" d=\"M 2834 3397 \n",
       "L 2834 2853 \n",
       "Q 2591 2978 2328 3040 \n",
       "Q 2066 3103 1784 3103 \n",
       "Q 1356 3103 1142 2972 \n",
       "Q 928 2841 928 2578 \n",
       "Q 928 2378 1081 2264 \n",
       "Q 1234 2150 1697 2047 \n",
       "L 1894 2003 \n",
       "Q 2506 1872 2764 1633 \n",
       "Q 3022 1394 3022 966 \n",
       "Q 3022 478 2636 193 \n",
       "Q 2250 -91 1575 -91 \n",
       "Q 1294 -91 989 -36 \n",
       "Q 684 19 347 128 \n",
       "L 347 722 \n",
       "Q 666 556 975 473 \n",
       "Q 1284 391 1588 391 \n",
       "Q 1994 391 2212 530 \n",
       "Q 2431 669 2431 922 \n",
       "Q 2431 1156 2273 1281 \n",
       "Q 2116 1406 1581 1522 \n",
       "L 1381 1569 \n",
       "Q 847 1681 609 1914 \n",
       "Q 372 2147 372 2553 \n",
       "Q 372 3047 722 3315 \n",
       "Q 1072 3584 1716 3584 \n",
       "Q 2034 3584 2315 3537 \n",
       "Q 2597 3491 2834 3397 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       <path id=\"DejaVuSans-74\" d=\"M 1172 4494 \n",
       "L 1172 3500 \n",
       "L 2356 3500 \n",
       "L 2356 3053 \n",
       "L 1172 3053 \n",
       "L 1172 1153 \n",
       "Q 1172 725 1289 603 \n",
       "Q 1406 481 1766 481 \n",
       "L 2356 481 \n",
       "L 2356 0 \n",
       "L 1766 0 \n",
       "Q 1100 0 847 248 \n",
       "Q 594 497 594 1153 \n",
       "L 594 3053 \n",
       "L 172 3053 \n",
       "L 172 3500 \n",
       "L 594 3500 \n",
       "L 594 4494 \n",
       "L 1172 4494 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "       <path id=\"DejaVuSans-64\" d=\"M 2906 2969 \n",
       "L 2906 4863 \n",
       "L 3481 4863 \n",
       "L 3481 0 \n",
       "L 2906 0 \n",
       "L 2906 525 \n",
       "Q 2725 213 2448 61 \n",
       "Q 2172 -91 1784 -91 \n",
       "Q 1150 -91 751 415 \n",
       "Q 353 922 353 1747 \n",
       "Q 353 2572 751 3078 \n",
       "Q 1150 3584 1784 3584 \n",
       "Q 2172 3584 2448 3432 \n",
       "Q 2725 3281 2906 2969 \n",
       "z\n",
       "M 947 1747 \n",
       "Q 947 1113 1208 752 \n",
       "Q 1469 391 1925 391 \n",
       "Q 2381 391 2643 752 \n",
       "Q 2906 1113 2906 1747 \n",
       "Q 2906 2381 2643 2742 \n",
       "Q 2381 3103 1925 3103 \n",
       "Q 1469 3103 1208 2742 \n",
       "Q 947 2381 947 1747 \n",
       "z\n",
       "\" transform=\"scale(0.015625)\"/>\n",
       "      </defs>\n",
       "      <use xlink:href=\"#DejaVuSans-6d\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-65\" x=\"97.412109\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-61\" x=\"158.935547\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-6e\" x=\"220.214844\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-20\" x=\"283.59375\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-30\" x=\"315.380859\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-2c\" x=\"379.003906\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-20\" x=\"410.791016\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-73\" x=\"442.578125\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-74\" x=\"494.677734\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-64\" x=\"533.886719\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-20\" x=\"597.363281\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-31\" x=\"629.150391\"/>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"line2d_29\">\n",
       "     <path d=\"M 52.78125 34.976563 \n",
       "L 62.78125 34.976563 \n",
       "L 72.78125 34.976563 \n",
       "\" style=\"fill: none; stroke-dasharray: 5.55,2.4; stroke-dashoffset: 0; stroke: #bf00bf; stroke-width: 1.5\"/>\n",
       "    </g>\n",
       "    <g id=\"text_16\">\n",
       "     <!-- mean 0, std 2 -->\n",
       "     <g transform=\"translate(80.78125 38.476563) scale(0.1 -0.1)\">\n",
       "      <use xlink:href=\"#DejaVuSans-6d\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-65\" x=\"97.412109\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-61\" x=\"158.935547\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-6e\" x=\"220.214844\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-20\" x=\"283.59375\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-30\" x=\"315.380859\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-2c\" x=\"379.003906\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-20\" x=\"410.791016\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-73\" x=\"442.578125\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-74\" x=\"494.677734\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-64\" x=\"533.886719\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-20\" x=\"597.363281\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-32\" x=\"629.150391\"/>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"line2d_30\">\n",
       "     <path d=\"M 52.78125 49.654688 \n",
       "L 62.78125 49.654688 \n",
       "L 72.78125 49.654688 \n",
       "\" style=\"fill: none; stroke-dasharray: 9.6,2.4,1.5,2.4; stroke-dashoffset: 0; stroke: #008000; stroke-width: 1.5\"/>\n",
       "    </g>\n",
       "    <g id=\"text_17\">\n",
       "     <!-- mean 3, std 1 -->\n",
       "     <g transform=\"translate(80.78125 53.154688) scale(0.1 -0.1)\">\n",
       "      <use xlink:href=\"#DejaVuSans-6d\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-65\" x=\"97.412109\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-61\" x=\"158.935547\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-6e\" x=\"220.214844\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-20\" x=\"283.59375\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-33\" x=\"315.380859\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-2c\" x=\"379.003906\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-20\" x=\"410.791016\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-73\" x=\"442.578125\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-74\" x=\"494.677734\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-64\" x=\"533.886719\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-20\" x=\"597.363281\"/>\n",
       "      <use xlink:href=\"#DejaVuSans-31\" x=\"629.150391\"/>\n",
       "     </g>\n",
       "    </g>\n",
       "   </g>\n",
       "  </g>\n",
       " </g>\n",
       " <defs>\n",
       "  <clipPath id=\"pd34fc29185\">\n",
       "   <rect x=\"43.78125\" y=\"7.2\" width=\"251.1\" height=\"138.6\"/>\n",
       "  </clipPath>\n",
       " </defs>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<Figure size 450x250 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Use NumPy again for visualization\n",
    "x = np.arange(-7, 7, 0.01)\n",
    "\n",
    "# Mean and standard deviation pairs\n",
    "params = [(0, 1), (0, 2), (3, 1)]\n",
    "d2l.plot(x, [normal(x, mu, sigma) for mu, sigma in params], xlabel='x',\n",
    "         ylabel='p(x)', figsize=(4.5, 2.5),\n",
    "         legend=[f'mean {mu}, std {sigma}' for mu, sigma in params])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa40d5c8",
   "metadata": {
    "origin_pos": 19
   },
   "source": [
    "Note that changing the mean corresponds\n",
    "to a shift along the $x$-axis,\n",
    "and increasing the variance\n",
    "spreads the distribution out,\n",
    "lowering its peak.\n",
    "\n",
    "One way to motivate linear regression with squared loss\n",
    "is to assume that observations arise from noisy measurements,\n",
    "where the noise $\\epsilon$ follows the normal distribution \n",
    "$\\mathcal{N}(0, \\sigma^2)$:\n",
    "\n",
    "$$y = \\mathbf{w}^\\top \\mathbf{x} + b + \\epsilon \\textrm{ where } \\epsilon \\sim \\mathcal{N}(0, \\sigma^2).$$\n",
    "\n",
    "Thus, we can now write out the *likelihood*\n",
    "of seeing a particular $y$ for a given $\\mathbf{x}$ via\n",
    "\n",
    "$$P(y \\mid \\mathbf{x}) = \\frac{1}{\\sqrt{2 \\pi \\sigma^2}} \\exp\\left(-\\frac{1}{2 \\sigma^2} (y - \\mathbf{w}^\\top \\mathbf{x} - b)^2\\right).$$\n",
    "\n",
    "As such, the likelihood factorizes.\n",
    "According to *the principle of maximum likelihood*,\n",
    "the best values of parameters $\\mathbf{w}$ and $b$ are those\n",
    "that maximize the *likelihood* of the entire dataset:\n",
    "\n",
    "$$P(\\mathbf y \\mid \\mathbf X) = \\prod_{i=1}^{n} p(y^{(i)} \\mid \\mathbf{x}^{(i)}).$$\n",
    "\n",
    "The equality follows since all pairs $(\\mathbf{x}^{(i)}, y^{(i)})$\n",
    "were drawn independently of each other.\n",
    "Estimators chosen according to the principle of maximum likelihood\n",
    "are called *maximum likelihood estimators*.\n",
    "While, maximizing the product of many exponential functions,\n",
    "might look difficult,\n",
    "we can simplify things significantly, without changing the objective,\n",
    "by maximizing the logarithm of the likelihood instead.\n",
    "For historical reasons, optimizations are more often expressed\n",
    "as minimization rather than maximization.\n",
    "So, without changing anything,\n",
    "we can *minimize* the *negative log-likelihood*,\n",
    "which we can express as follows:\n",
    "\n",
    "$$-\\log P(\\mathbf y \\mid \\mathbf X) = \\sum_{i=1}^n \\frac{1}{2} \\log(2 \\pi \\sigma^2) + \\frac{1}{2 \\sigma^2} \\left(y^{(i)} - \\mathbf{w}^\\top \\mathbf{x}^{(i)} - b\\right)^2.$$\n",
    "\n",
    "If we assume that $\\sigma$ is fixed,\n",
    "we can ignore the first term,\n",
    "because it does not depend on $\\mathbf{w}$ or $b$.\n",
    "The second term is identical\n",
    "to the squared error loss introduced earlier,\n",
    "except for the multiplicative constant $\\frac{1}{\\sigma^2}$.\n",
    "Fortunately, the solution does not depend on $\\sigma$ either.\n",
    "It follows that minimizing the mean squared error\n",
    "is equivalent to the maximum likelihood estimation\n",
    "of a linear model under the assumption of additive Gaussian noise.\n",
    "\n",
    "\n",
    "## Linear Regression as a Neural Network\n",
    "\n",
    "While linear models are not sufficiently rich\n",
    "to express the many complicated networks\n",
    "that we will introduce in this book,\n",
    "(artificial) neural networks are rich enough\n",
    "to subsume linear models as networks\n",
    "in which every feature is represented by an input neuron,\n",
    "all of which are connected directly to the output.\n",
    "\n",
    ":numref:`fig_single_neuron` depicts\n",
    "linear regression as a neural network.\n",
    "The diagram highlights the connectivity pattern,\n",
    "such as how each input is connected to the output,\n",
    "but not the specific values taken by the weights or biases.\n",
    "\n",
    "![Linear regression is a single-layer neural network.](../img/singleneuron.svg)\n",
    ":label:`fig_single_neuron`\n",
    "\n",
    "The inputs are $x_1, \\ldots, x_d$.\n",
    "We refer to $d$ as the *number of inputs*\n",
    "or the *feature dimensionality* in the input layer.\n",
    "The output of the network is $o_1$.\n",
    "Because we are just trying to predict\n",
    "a single numerical value,\n",
    "we have only one output neuron.\n",
    "Note that the input values are all *given*.\n",
    "There is just a single *computed* neuron.\n",
    "In summary, we can think of linear regression\n",
    "as a single-layer fully connected neural network.\n",
    "We will encounter networks\n",
    "with far more layers\n",
    "in later chapters.\n",
    "\n",
    "### Biology\n",
    "\n",
    "Because linear regression predates computational neuroscience,\n",
    "it might seem anachronistic to describe\n",
    "linear regression in terms of neural networks.\n",
    "Nonetheless, they were a natural place to start\n",
    "when the cyberneticists and neurophysiologists\n",
    "Warren McCulloch and Walter Pitts began to develop\n",
    "models of artificial neurons.\n",
    "Consider the cartoonish picture\n",
    "of a biological neuron in :numref:`fig_Neuron`,\n",
    "consisting of *dendrites* (input terminals),\n",
    "the *nucleus* (CPU), the *axon* (output wire),\n",
    "and the *axon terminals* (output terminals),\n",
    "enabling connections to other neurons via *synapses*.\n",
    "\n",
    "![The real neuron (source: \"Anatomy and Physiology\" by the US National Cancer Institute's Surveillance, Epidemiology and End Results (SEER) Program).](../img/neuron.svg)\n",
    ":label:`fig_Neuron`\n",
    "\n",
    "Information $x_i$ arriving from other neurons\n",
    "(or environmental sensors) is received in the dendrites.\n",
    "In particular, that information is weighted\n",
    "by *synaptic weights* $w_i$,\n",
    "determining the effect of the inputs,\n",
    "e.g., activation or inhibition via the product $x_i w_i$.\n",
    "The weighted inputs arriving from multiple sources\n",
    "are aggregated in the nucleus\n",
    "as a weighted sum $y = \\sum_i x_i w_i + b$,\n",
    "possibly subject to some nonlinear postprocessing via a function $\\sigma(y)$.\n",
    "This information is then sent via the axon to the axon terminals,\n",
    "where it reaches its destination\n",
    "(e.g., an actuator such as a muscle)\n",
    "or it is fed into another neuron via its dendrites.\n",
    "\n",
    "Certainly, the high-level idea that many such units\n",
    "could be combined, provided they have the correct connectivity and learning algorithm,\n",
    "to produce far more interesting and complex behavior\n",
    "than any one neuron alone could express\n",
    "arises from our study of real biological neural systems.\n",
    "At the same time, most research in deep learning today\n",
    "draws inspiration from a much wider source.\n",
    "We invoke :citet:`Russell.Norvig.2016`\n",
    "who pointed out that although airplanes might have been *inspired* by birds,\n",
    "ornithology has not been the primary driver\n",
    "of aeronautics innovation for some centuries.\n",
    "Likewise, inspiration in deep learning these days\n",
    "comes in equal or greater measure\n",
    "from mathematics, linguistics, psychology,\n",
    "statistics, computer science, and many other fields.\n",
    "\n",
    "## Summary\n",
    "\n",
    "In this section, we introduced\n",
    "traditional linear regression,\n",
    "where the parameters of a linear function\n",
    "are chosen to minimize squared loss on the training set.\n",
    "We also motivated this choice of objective\n",
    "both via some practical considerations\n",
    "and through an interpretation\n",
    "of linear regression as maximimum likelihood estimation\n",
    "under an assumption of linearity and Gaussian noise.\n",
    "After discussing both computational considerations\n",
    "and connections to statistics,\n",
    "we showed how such linear models could be expressed\n",
    "as simple neural networks where the inputs\n",
    "are directly wired to the output(s).\n",
    "While we will soon move past linear models altogether,\n",
    "they are sufficient to introduce most of the components\n",
    "that all of our models require:\n",
    "parametric forms, differentiable objectives,\n",
    "optimization via minibatch stochastic gradient descent,\n",
    "and ultimately, evaluation on previously unseen data.\n",
    "\n",
    "\n",
    "\n",
    "## Exercises\n",
    "\n",
    "1. Assume that we have some data $x_1, \\ldots, x_n \\in \\mathbb{R}$. Our goal is to find a constant $b$ such that $\\sum_i (x_i - b)^2$ is minimized.\n",
    "    1. Find an analytic solution for the optimal value of $b$.\n",
    "    1. How does this problem and its solution relate to the normal distribution?\n",
    "    1. What if we change the loss from $\\sum_i (x_i - b)^2$ to $\\sum_i |x_i-b|$? Can you find the optimal solution for $b$?\n",
    "1. Prove that the affine functions that can be expressed by $\\mathbf{x}^\\top \\mathbf{w} + b$ are equivalent to linear functions on $(\\mathbf{x}, 1)$.\n",
    "1. Assume that you want to find quadratic functions of $\\mathbf{x}$, i.e., $f(\\mathbf{x}) = b + \\sum_i w_i x_i + \\sum_{j \\leq i} w_{ij} x_{i} x_{j}$. How would you formulate this in a deep network?\n",
    "1. Recall that one of the conditions for the linear regression problem to be solvable was that the design matrix $\\mathbf{X}^\\top \\mathbf{X}$ has full rank.\n",
    "    1. What happens if this is not the case?\n",
    "    1. How could you fix it? What happens if you add a small amount of coordinate-wise independent Gaussian noise to all entries of $\\mathbf{X}$?\n",
    "    1. What is the expected value of the design matrix $\\mathbf{X}^\\top \\mathbf{X}$ in this case?\n",
    "    1. What happens with stochastic gradient descent when $\\mathbf{X}^\\top \\mathbf{X}$ does not have full rank?\n",
    "1. Assume that the noise model governing the additive noise $\\epsilon$ is the exponential distribution. That is, $p(\\epsilon) = \\frac{1}{2} \\exp(-|\\epsilon|)$.\n",
    "    1. Write out the negative log-likelihood of the data under the model $-\\log P(\\mathbf y \\mid \\mathbf X)$.\n",
    "    1. Can you find a closed form solution?\n",
    "    1. Suggest a minibatch stochastic gradient descent algorithm to solve this problem. What could possibly go wrong (hint: what happens near the stationary point as we keep on updating the parameters)? Can you fix this?\n",
    "1. Assume that we want to design a neural network with two layers by composing two linear layers. That is, the output of the first layer becomes the input of the second layer. Why would such a naive composition not work?\n",
    "1. What happens if you want to use regression for realistic price estimation of houses or stock prices?\n",
    "    1. Show that the additive Gaussian noise assumption is not appropriate. Hint: can we have negative prices? What about fluctuations?\n",
    "    1. Why would regression to the logarithm of the price be much better, i.e., $y = \\log \\textrm{price}$?\n",
    "    1. What do you need to worry about when dealing with pennystock, i.e., stock with very low prices? Hint: can you trade at all possible prices? Why is this a bigger problem for cheap stock? For more information review the celebrated Black--Scholes model for option pricing :cite:`Black.Scholes.1973`.\n",
    "1. Suppose we want to use regression to estimate the *number* of apples sold in a grocery store.\n",
    "    1. What are the problems with a Gaussian additive noise model? Hint: you are selling apples, not oil.\n",
    "    1. The [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution) captures distributions over counts. It is given by $p(k \\mid \\lambda) = \\lambda^k e^{-\\lambda}/k!$. Here $\\lambda$ is the rate function and $k$ is the number of events you see. Prove that $\\lambda$ is the expected value of counts $k$.\n",
    "    1. Design a loss function associated with the Poisson distribution.\n",
    "    1. Design a loss function for estimating $\\log \\lambda$ instead.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b6c7418",
   "metadata": {
    "origin_pos": 21,
    "tab": [
     "pytorch"
    ]
   },
   "source": [
    "[Discussions](https://discuss.d2l.ai/t/258)\n"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  },
  "required_libs": []
 },
 "nbformat": 4,
 "nbformat_minor": 5
}